158 Commits

Author SHA1 Message Date
ThetaDev 417b79757f Merge branch 'dev' of github.com:TeamNewPipe/NewPipeExtractor into channel-tabs 2023-04-27 11:56:50 +02:00
ThetaDev 47aa9fed40 fix: set musicClientVersion regex capture group 2023-04-16 19:25:05 +02:00
ThetaDev f306db0178 refactor: move YT channel utils to YouTubeChannelHelper 2023-03-30 13:23:44 +02:00
ThetaDev 97d7ee5663 Merge branch 'dev' of github.com:Theta-Dev/NewPipeExtractor into channel-tabs 2023-03-30 13:08:12 +02:00
ThetaDev 8d1303e18f Add track types to audio streams (#1041) 2023-03-28 00:02:20 +02:00
ThetaDev c6ee2f3ff7 fix: add checkIfChannelResponseIsValid method 2023-03-22 01:27:55 +01:00
ThetaDev 8ecee8737c fix: channel extractor tests, docs 2023-03-22 01:00:05 +01:00
ThetaDev 9cebcf7ab6 Merge branch 'dev' of github.com:TeamNewPipe/NewPipeExtractor into channel-tabs 2023-03-21 00:45:53 +01:00
AudricV 1556adbb2d [YouTube] Fix hashtags links extraction and escape text in attribute descriptions + HTML links
The webCommandMetadata object is contained inside a commandMetadata object, so it
is not accessible from the root of the navigationEndpoint object.

The corresponding statement has been moved to the bottom of the specific
endpoints parsing, as the webCommandMetadata object is present almost
everywhere; otherwise, the URLs of some endpoints would have been changed, such
as uploader URLs (from channel IDs to handles).

As getUrlFromNavigationEndpoint, and therefore getTextFromObject,
getUrlFromObject and getTextAtKey, no longer throw a ParsingException, the
methods that were catching ParsingExceptions thrown by them had to be updated.

URLs obtained in the HTML version of getTextFromObject are now properly escaped
so that clients receive valid HTML. The same has been done for attributed
descriptions, where the description text is escaped as well.

As YouTube descriptions are in HTML format (except for the fallback on the JSON
player response, which is plain text and is only used when there is no visual
metadata or after a breaking change), all returned URLs are escaped, so tests
that check for URLs containing escaped characters had to be updated (this was
only the case for YoutubeStreamExtractorDefaultTest.DescriptionTestUnboxing).
2023-02-26 18:43:36 +01:00
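A minimal sketch of the two ideas in this commit, assuming the endpoint JSON has been parsed into nested `java.util.Map`s (the extractor itself works on JSON objects, not maps): the URL is read from `navigationEndpoint.commandMetadata.webCommandMetadata.url`, returning null instead of throwing when a level is missing, and both the URL and the link text are HTML-escaped before being placed in an `<a>` element. The `EndpointUrls` class and its methods are hypothetical.

```java
import java.util.Map;

// Hypothetical helper; the real extractor operates on JSON objects rather than Maps.
final class EndpointUrls {
    private EndpointUrls() {
    }

    // Returns navigationEndpoint.commandMetadata.webCommandMetadata.url, or null when
    // any level is missing (no exception is thrown).
    static String webCommandUrl(Map<String, Object> navigationEndpoint) {
        Object commandMetadata = navigationEndpoint.get("commandMetadata");
        if (!(commandMetadata instanceof Map)) {
            return null;
        }
        Object webCommandMetadata = ((Map<?, ?>) commandMetadata).get("webCommandMetadata");
        if (!(webCommandMetadata instanceof Map)) {
            return null;
        }
        Object url = ((Map<?, ?>) webCommandMetadata).get("url");
        return url instanceof String ? (String) url : null;
    }

    // Escapes the characters that are significant in HTML text and quoted attributes.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a link whose href and text are both escaped.
    static String link(String url, String text) {
        return "<a href=\"" + escapeHtml(url) + "\">" + escapeHtml(text) + "</a>";
    }
}
```

Looking up `webCommandMetadata` directly on the root object returns null here, which mirrors the bug described in the commit.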
TobiGr 3f7df9536e [YouTube] Fix getting the comment text if the comment contains a hashtag 2023-01-29 20:33:51 +01:00
Stypox 7293991832 [YouTube] Now music mixes can be treated as normal mixes
Using a playlist extractor on them would result in "Unviewable playlist" errors
2023-01-15 23:28:59 +01:00
TobiGr 56aab4d971 [YouTube] Fix escaping links in YouTubeParsingHelper.getTextFromObject 2023-01-05 00:28:12 +01:00
Stypox 45636b0d00 Merge pull request #986 from Isira-Seneviratne/Static_maps
Use immutable Map factory methods.
2023-01-02 18:11:14 +01:00
Stypox 219c5c5be5 Update extractor/src/main/java/org/schabi/newpipe/extractor/services/youtube/YoutubeParsingHelper.java 2023-01-02 18:11:03 +01:00
Isira Seneviratne d8ce08d969 Use immutable Map factory methods. 2023-01-02 07:50:31 +05:30
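For illustration, immutable constants built with the Java 9+ factory methods `Map.of` and `Set.of`; any attempt to modify them throws `UnsupportedOperationException`. The constant names are hypothetical, and the values are borrowed from the client spoofing commit further down, not from the actual contents of YoutubeParsingHelper.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical constants with illustrative values.
final class ImmutableConstants {
    private ImmutableConstants() {
    }

    // Map.of (Java 9+) returns an unmodifiable map; put/remove throw UnsupportedOperationException.
    static final Map<String, String> OS_NAMES = Map.of(
            "ANDROID", "Android",
            "IOS", "iOS");

    // Set.of behaves the same way for sets and rejects duplicate elements at creation.
    static final Set<String> CLIENT_NAMES = Set.of("WEB", "ANDROID", "IOS");
}
```

On Java 8, `Collections.singletonMap` and `Collections.unmodifiableMap` are the closest equivalents.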
Kavin 01acf79436 Fix for potential XSS attacks. 2022-12-31 20:05:32 +00:00
AudricV d5437e0bc5 Merge pull request #863 from AudricV/add-content-type-and-content-length-headers-to-post-requests
Add Content-Type header to all POST requests with a non-empty body
2022-12-16 19:32:56 +01:00
ThetaDev c156c404cb Merge branch 'dev' of github.com:TeamNewPipe/NewPipeExtractor into channel-tabs 2022-11-29 17:50:32 +01:00
ThetaDev ffd02a4bc8 fix: shorts continuation 2022-11-29 17:50:14 +01:00
Kavin 52fda37915 Implement bold/italic/strike-through support. 2022-11-28 19:06:18 +00:00
AudricV 3891542ca1 Use Downloader's postWithContentType and postWithContentTypeJson methods in services and extractors 2022-11-22 11:37:18 +01:00
AudricV e9a0d3bd95 [YouTube] Send Content-Type header in all POST requests
Previously, this header was only sent in some of these requests; in the others, it was guessed and added by OkHttp. This can create issues when using HTTP clients other than OkHttp, such as Cronet.

Some code in the modified classes has been improved and/or deduplicated, and usages of the UTF_8 constant of the Utils class have been replaced by StandardCharsets.UTF_8 where possible.

Note that this header has not been added in YoutubeDashManifestCreatorsUtils, as the POST requests made by this class send an empty body.
2022-11-22 11:37:16 +01:00
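A minimal sketch of the rule described in this commit, using the JDK's `java.net.http.HttpRequest` (Java 11+) as a stand-in for the extractor's own `Downloader` abstraction, whose method signatures are not shown here: Content-Type is set explicitly whenever a POST request has a non-empty body and omitted when the body is empty, as in the DASH manifest case. The `PostRequests` class and its methods are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper built on java.net.http; requests are built here, not sent.
final class PostRequests {
    private PostRequests() {
    }

    // Sets Content-Type explicitly for non-empty bodies so the header no longer
    // depends on the HTTP client guessing it.
    static HttpRequest post(URI uri, byte[] body, String contentType) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body));
        if (body.length > 0) {
            builder.header("Content-Type", contentType);
        }
        return builder.build();
    }

    // Convenience wrapper for JSON bodies encoded as UTF-8.
    static HttpRequest postJson(URI uri, String json) {
        return post(uri, json.getBytes(StandardCharsets.UTF_8), "application/json");
    }
}
```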
ThetaDev f7e3b713b5 Merge branch 'dev' into channel-tabs 2022-11-22 02:38:03 +01:00
ThetaDev 8d3bc2bc4b fix: YoutubeParsingHelper formatting 2022-11-22 01:59:51 +01:00
Tobi 2211a24b69 Merge pull request #971 from lrusso96/patch-1
[YouTube] Improve duration parsing
2022-11-16 16:14:54 +01:00
Isira Seneviratne ddbce3b83d Add Utils methods for URL encoding/decoding using UTF-8. 2022-11-12 07:29:15 +05:30
Isira Seneviratne 366f5c1632 Use StandardCharsets.UTF_8. 2022-11-12 07:29:15 +05:30
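A sketch of what UTF-8 URL encoding/decoding helpers can look like on Java 8, where `URLEncoder.encode(String, String)` and `URLDecoder.decode(String, String)` declare a checked `UnsupportedEncodingException` even for UTF-8. The `UrlCodec` class and its method names are hypothetical; the actual Utils method names are not given in the commit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers wrapping the checked exception once.
final class UrlCodec {
    private UrlCodec() {
    }

    // Form-encodes s as UTF-8 (spaces become '+').
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every JVM must support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Decodes a UTF-8 form-encoded string ('+' becomes a space).
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

On Java 10+, the `encode(String, Charset)` and `decode(String, Charset)` overloads avoid the checked exception altogether.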
Luigi Russo c9635218e2 [YouTube] Improve duration parsing 2022-11-09 09:41:29 +01:00
Isira Seneviratne 316d8573fa Use immutable sets in YoutubeParsingHelper. 2022-11-07 07:50:26 +05:30
ThetaDev 73c182f817 Merge branch 'dev' of github.com:TeamNewPipe/NewPipeExtractor into channel-tabs 2022-11-04 23:50:04 +01:00
ThetaDev f71fdac166 refactor: API changes 2022-11-04 23:47:44 +01:00
ThetaDev 592e1d6386 fix: parsing attributed description with no command runs 2022-11-03 12:10:52 +01:00
ThetaDev 099b53cc4f [YouTube] Add parser for attributedDescription
Also update the mock of the next InnerTube endpoint response of the
YoutubeStreamExtractorDefaultTest.DescriptionTestUnboxing test class with an
attributedDescription instead of a regular description
2022-11-02 23:11:33 +01:00
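A simplified sketch of turning an attributed description into HTML. It assumes the description has been reduced to its `content` string plus a list of command runs, each with a `startIndex`, a `length` (treated here as UTF-16 offsets into `content`) and an already-resolved link URL; the real command runs nest the link inside further JSON, and the `CommandRun` and `AttributedDescription` classes are hypothetical. An empty run list, the case handled by the "no command runs" fix above, yields the escaped plain text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified command run: a span of the content plus its resolved URL.
final class CommandRun {
    final int startIndex;
    final int length;
    final String url;

    CommandRun(int startIndex, int length, String url) {
        this.startIndex = startIndex;
        this.length = length;
        this.url = url;
    }
}

final class AttributedDescription {
    private AttributedDescription() {
    }

    static String toHtml(String content, List<CommandRun> commandRuns) {
        List<CommandRun> runs = new ArrayList<>(commandRuns);
        runs.sort(Comparator.comparingInt(run -> run.startIndex));
        StringBuilder html = new StringBuilder();
        int pos = 0;
        for (CommandRun run : runs) {
            int end = run.startIndex + run.length;
            // Skip overlapping, negative or out-of-range runs.
            if (run.startIndex < pos || run.length < 0 || end > content.length()) {
                continue;
            }
            html.append(escape(content.substring(pos, run.startIndex)));
            html.append("<a href=\"").append(escape(run.url)).append("\">")
                    .append(escape(content.substring(run.startIndex, end))).append("</a>");
            pos = end;
        }
        html.append(escape(content.substring(pos)));
        // Line breaks in the plain-text content become HTML line breaks.
        return html.toString().replace("\n", "<br>");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```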
Kavin 6a256d0631 Add uploader url and verified to PlaylistInfoItem. 2022-10-30 13:00:19 +00:00
ThetaDev 12537733c1 fix: store YouTube visitor data for channel tabs 2022-10-25 09:20:18 +02:00
ThetaDev 57865e2195 feat: add visitor data config option 2022-10-23 21:57:15 +02:00
ThetaDev 8b4b4310ea feat: add tab support to channel extractor
- extract YouTube channel tabs: playlists, channels, shorts, live
2022-10-22 15:29:35 +02:00
Isira Seneviratne 943b7c033b Remove EMPTY_STRING. 2022-08-24 06:59:17 +05:30
litetex 8ff7a90f52 Improved consent cookie related constants and documentation 2022-08-21 18:41:40 +02:00
litetex ecfc370685 Fixed all YTMixPlaylists
Added option to choose if you want to consent or not - currently this is done by a static variable in ``YoutubeParsingHelper`` - may not be the best long-term solution but for now the tests work again (in EU countries) 🥳
2022-08-14 14:48:27 +02:00
AudricV c82317e318 [YouTube] Spoof more mobile clients
Additional parameters have been added to the player requests of ANDROID and IOS
clients:

- for both clients: osName and osVersion, with the following values:
  - for the ANDROID client: Android and 12;
  - for the IOS client: iOS and 15.6.0.19G71;
- for the ANDROID client: androidTargetSdkVersion, with the Android SDK version
  corresponding to the Android version used in the player requests of this
  client. This parameter is now required with this client to get a correct
  player response; otherwise, a response for a video stating that the content
  is not available in this app and should be watched in the latest version of
  YouTube may be returned instead;
- for the IOS client: deviceMake, with Apple as its value.

The iOS version sent in the IOS client player requests has also been updated to
iOS 15.6.

Finally, a comment about the requirement to use the signature timestamp from
the player JavaScript base file for HTML5 player requests on videos with
obfuscated URLs has been added, replacing a previous one which may not have
been accurate.
2022-08-12 19:20:31 +02:00
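The additional ANDROID and IOS client fields listed above, sketched as the `client` part of a player request context built with plain `Map`s. The field names and values come from the commit message; `androidTargetSdkVersion` is 31 because that is the API level of Android 12. The `MobileClientContext` class and the `clientName` values are assumptions, and the other fields a real InnerTube request needs (client version, language, region and so on) are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the client fields described in the commit.
final class MobileClientContext {
    private MobileClientContext() {
    }

    static final String ANDROID_VERSION = "12";
    static final int ANDROID_SDK_VERSION = 31; // API level of Android 12
    static final String IOS_VERSION = "15.6.0.19G71";

    static Map<String, Object> androidClient() {
        Map<String, Object> client = new LinkedHashMap<>();
        client.put("clientName", "ANDROID");
        client.put("osName", "Android");
        client.put("osVersion", ANDROID_VERSION);
        // Without this field, a "not available in this app" response may be returned.
        client.put("androidTargetSdkVersion", ANDROID_SDK_VERSION);
        return client;
    }

    static Map<String, Object> iosClient() {
        Map<String, Object> client = new LinkedHashMap<>();
        client.put("clientName", "IOS");
        client.put("osName", "iOS");
        client.put("osVersion", IOS_VERSION);
        client.put("deviceMake", "Apple");
        return client;
    }
}
```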
AudricV d0549a5a52 [YouTube] Update client versions and use a real version for the iOS client
The iOS version can in fact be obtained easily by looking at the What's New section of the app's App Store page.
2022-08-12 19:20:31 +02:00
AudricV d7e678aca2 [YouTube] Improve WEB client version and API key HTML extraction
Common code in the WEB client version HTML extraction has been deduplicated, the Java 8 Stream API is now used, and the initial data fallback is only used as a last resort.
This means that client version extraction with regexes is attempted before this fallback, as the fallback does not contain the full client version.
An incomplete client version could be used to fingerprint the extractor, even if this does not seem to be the case.
2022-08-12 19:20:30 +02:00
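A sketch of the extraction order described above, with hypothetical regexes (the actual patterns are not given here): the patterns are tried in order with the Stream API, capture group 1 of the first match is used as the client version, and the possibly incomplete initial data value is only used when no pattern matches. The `ClientVersionExtractor` class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor; the patterns are illustrative only.
final class ClientVersionExtractor {
    private ClientVersionExtractor() {
    }

    // Tried in order; group 1 captures the version.
    private static final List<Pattern> PATTERNS = Arrays.asList(
            Pattern.compile("INNERTUBE_CONTEXT_CLIENT_VERSION\":\"([0-9.]+?)\""),
            Pattern.compile("innertube_context_client_version\":\"([0-9.]+?)\""));

    // The stream is lazy, so patterns after the first match are never evaluated.
    static Optional<String> fromHtml(String html) {
        return PATTERNS.stream()
                .map(pattern -> pattern.matcher(html))
                .filter(Matcher::find)
                .map(matcher -> matcher.group(1))
                .findFirst();
    }

    // The fallback may hold an incomplete version, so it is consulted last.
    static String extract(String html, String initialDataFallback) {
        return fromHtml(html).orElse(initialDataFallback);
    }
}
```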
Isira Seneviratne 1af6b8eedb Use Collections.singletonList(). 2022-07-27 07:35:57 +05:30
Isira Seneviratne ff60e05c76 Use Collections.singletonMap(). 2022-07-27 07:35:57 +05:30
TiA4f8R f17f7b9842 Apply requested changes in YoutubeParsingHelper 2022-05-28 12:00:55 +02:00
Stypox 50272db946 Apply reviews: improve comments, remove FILE, remove Stream#equals(Stream) 2022-05-28 12:00:49 +02:00
TiA4f8R aa4c10e751 Improve documentation and address most of the requested changes
Also fix some issues in several places, in the code and the documentation.
2022-05-28 12:00:46 +02:00
TiA4f8R a857684442 Apply changes in YoutubeStreamExtractor
Extract post live DVR streams as post live streams instead of live streams.

A new class has been added to improve the code: ItagInfo, which stores an itag, the extracted content (URL) and whether it is a URL or not.
A functional interface has been added to abstract stream building: StreamBuilderHelper.
Also add the cver parameter added by the desktop web client on the corresponding streams (a new method has been added in YoutubeParsingHelper to check this and another for Android streams).

Some code in these classes has been also refactored/improved/optimized.
2022-05-28 12:00:44 +02:00
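A sketch of the two abstractions named in this commit. Only their purpose is described, so the fields, the `buildStream` method, the `StreamBuilding` class and its `withCver` helper are all hypothetical: `ItagInfo` carries an itag, the extracted content and whether that content is a URL, and `StreamBuilderHelper` is a functional interface that lets the same loop build different stream types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of ItagInfo.
final class ItagInfo {
    final int itag;
    final String content; // the extracted content
    final boolean isUrl;  // whether content is a URL

    ItagInfo(int itag, String content, boolean isUrl) {
        this.itag = itag;
        this.content = content;
        this.isUrl = isUrl;
    }
}

// Hypothetical functional interface: turns one ItagInfo into a stream object of type T.
@FunctionalInterface
interface StreamBuilderHelper<T> {
    T buildStream(ItagInfo itagInfo);
}

final class StreamBuilding {
    private StreamBuilding() {
    }

    // Applies the same builder to every extracted itag.
    static <T> List<T> buildAll(List<ItagInfo> itagInfos, StreamBuilderHelper<T> helper) {
        List<T> streams = new ArrayList<>();
        for (ItagInfo itagInfo : itagInfos) {
            streams.add(helper.buildStream(itagInfo));
        }
        return streams;
    }

    // Hypothetical helper: appends the cver parameter used by the desktop web client.
    static String withCver(String url, String clientVersion) {
        return url + (url.contains("?") ? "&" : "?") + "cver=" + clientVersion;
    }
}
```

Usage: `StreamBuilding.buildAll(itagInfos, info -> StreamBuilding.withCver(info.content, version))` builds one URL per itag with a single lambda.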
TiA4f8R c34b5e3a8b [YouTube] Fix extraction of YouTube Music client version and API key when using YouTube Music's website in EU
Google now returns YouTube's consent page for YouTube Music in the EU; it can also be avoided by adding the ucbcb parameter with the value 1 to the URL ("?ucbcb=1").
2022-05-15 11:20:06 +02:00
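A sketch of adding the `ucbcb=1` parameter to a YouTube Music URL, assuming it should be appended to any existing query string and placed before a fragment. The `ConsentBypass` class and its method are hypothetical.

```java
// Hypothetical helper for the ucbcb=1 workaround.
final class ConsentBypass {
    private ConsentBypass() {
    }

    // Appends ucbcb=1 with '?' or '&' as appropriate, keeping any #fragment at the end.
    static String withUcbcb(String url) {
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + "ucbcb=1" + fragment;
    }
}
```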