NewPipeExtractor

Commit Graph

Author	SHA1	Message	Date
Stypox	6589e2c15d	Merge pull request #1148 from Stypox/mediaccc-channel-tab-handler [MediaCCC] Allow obtaining channel tab link handler	2024-03-28 13:45:05 +01:00
Stypox	c57016b79b	Make getCommentText @Nonnull	2024-03-27 15:26:06 +01:00
TobiGr	aaccfecda8	[YouTube] Detect new account termination messages	2024-03-20 14:57:41 +01:00
Stypox	aaf3231fc7	[MediaCCC] Fix lambda link handler keeping reference to extractor This caused problems in NewPipe, because extractors are not serializable, and well, keeping references to them is a bad idea anyway.	2023-12-30 23:23:19 +01:00
Stypox	cc9ade962e	[MediaCCC] Allow obtaining channel tab extractor from scratch i.e. without needing to pass through the conference/channel extractor This was needed because clients (like NewPipe) might rely on link handlers to hold as little data as possible, since they might be kept around for long or passed around in system transactions, so this commit allows obtaining a standalone link handler that does not hold a JsonObject within itself.	2023-12-30 22:53:27 +01:00
petlyh	4408e2d0ac	[YouTube] Add Albums channel tab	2023-12-30 14:01:30 +01:00
petlyh	2b2c1546d1	Avoid PeerTube accepting non-URLs	2023-12-29 12:27:39 +01:00
Tobi	1e93b1dc20	Merge pull request #1135 from Stypox/yt-emergency-info [YouTube] Implement emergency meta info	2023-12-29 12:01:40 +01:00
dragfyre	65e7bc5b95	Update PeertubeTrendingLinkHandlerFactory.java correcting Peertube local trending api URL (per #10685 in main NewPipe repo); see https://docs.joinpeertube.org/api-rest-reference.html#tag/Video/operation/getVideos	2023-12-28 14:50:31 +07:00
Stypox	5b59a1a8c5	[YouTube] Move meta info extraction to separate file YoutubeParsingHelper was longer than 2000 lines which caused checkstyle issues	2023-12-21 21:19:08 +01:00
Stypox	b8e12dd76c	[YouTube] Implement emergency meta info YouTube provides that meta info panel when users search for really sensitive content like suicide (e.g. "blue whale"). It contains: - an encouragement as title (e.g. "We are with you") - a phone number as action - details about how to call the phone number (e.g. availability) - an url pointing to the website of an association Also add a test that just checks if a meta info is properly extracted	2023-12-21 21:19:08 +01:00
Stypox	2938067c2c	[YouTube] Shorts don't provide a duration anymore	2023-12-21 20:41:01 +01:00
AudricV	56ab35423e	[YouTube] Fix potential NullPointerException in YoutubeSearchExtractor.getSearchSuggestion	2023-12-08 21:46:48 +01:00
AudricV	6ba8251be1	[YouTube] Bypass crisis resources blocking search results These crisis resources are preventing search results from being returned. See https://support.google.com/youtube/answer/10726080?hl=en for more info on them. This commit changes search parameters to include the property that allows search results to be shown.	2023-12-08 21:46:47 +01:00
AudricV	7dea2d0d27	[YouTube] Remove Channels channel tab support This tab has been removed by YouTube.	2023-12-08 21:46:47 +01:00
AudricV	3782d9a02a	[YouTube] Support new A/B tested like data and avoid like count conversion from integer to long Also make minor improvements to current like data extraction and remove previous like count data support, as it is not returned anymore.	2023-12-08 21:46:46 +01:00
AudricV	b71ce1123f	[YouTube] Extract only search results corresponding to a search type YouTube sometimes returns videos inside channel search results. As we only want results corresponding to the type we requested, this commit makes YoutubeSearchExtractor ignore non-requested search results we get, using the extractor LinkHandler's first content filter value. Also remove an unneeded exception throwing declaration in YoutubeSearchExtractor.	2023-12-08 21:46:46 +01:00
AudricV	ff8ed7247f	[YouTube] Switch to new consent cookie Also move the documentation of the consent in its setter method in order to be accessible publicly and improve it.	2023-12-08 21:46:46 +01:00
AudricV	2c941794c0	[YouTube] Add utcOffsetMinutes to all InnerTube payloads This should make returned dates consistent between timezones and countries in which the extractor is run. It was previously only set on YouTube Music search continuations.	2023-12-08 21:46:46 +01:00
AudricV	d97c9e0db1	[YouTube] Improve payloads and URLs of InnerTube requests For every InnerTube request: - Always add a `request` object with the following properties: - "internalExperimentFlags" set to an empty array; - "useSsl" set to "true"; - "lockedSafetyMode" set to "false". - Use proper TODO comment to provide a way to enable restricted mode on every request and add it on requests on which it wasn't present. For YouTube Music: - Remove alt query parameter, as it is not used anymore by the website; - Add prettyPrint query parameter with false value on YouTube Music search continuations.	2023-12-08 21:46:45 +01:00
AudricV	8a9ebcc373	[YouTube] Update InnerTube clients' version and devices' OS version and model	2023-12-08 21:46:45 +01:00
FineFindus	34b05a0dda	feat(youtube/comments): support creator replies	2023-10-09 16:33:43 +02:00
FineFindus	c1784a4bdb	[YouTube] Add channel owner to comments	2023-10-09 16:33:43 +02:00
Tobi	d6f5cba6e2	Merge pull request #1111 from FineFindus/feat/creator-reply Add `hasCreatorReply()` to CommentsInfoItem	2023-10-09 12:45:56 +02:00
TobiGr	d49f8411d7	[PeerTube] Implement CommentsInfoItemExtractor.hasCreatorReply()	2023-10-09 02:47:12 +02:00
AudricV	c98695fcea	[SoundCloud] Fix extraction of non-JPG images Default image qualities were removed in image URLs with the jpg extension, causing the addition of the image suffix to full non-JPG images URLs and so to invalid image URLs. Only the image quality name with its leading "-" character and the "." character after the name is now removed and replaced by a string format replaced itself with the image quality name for each quality. As the image suffixes do not contain the image extension, the name of image qualities lists has been adapted with these changes and some related comments have been also improved.	2023-10-01 20:33:25 +02:00
FineFindus	dd7b2d9798	feat(youtube/comments): support creator replies	2023-09-25 10:40:45 +02:00
Youssif Shaaban Alsager	917554acc4	[YouTube] Add support for ultralow audio formats (#1063)	2023-09-24 19:04:34 +02:00
Christian	fc67d49f59	Update copyright notices Update copyright notices to comply to GPLv3 and change NewPipe to NewPipe Extractor on some notices that were not updated.	2023-09-22 19:10:15 -03:00
AudricV	714b141ecb	[YouTube] Catch any exception when extracting something from JavaScript's base player	2023-09-21 21:59:33 +02:00
AudricV	588c6a8422	[YouTube] Quote signature deobfuscation function name and add semicolon only where needed	2023-09-21 21:59:33 +02:00
AudricV	a04bc320de	[YouTube] Convert signature timestamp to integer The signature timestamp is used as a number by HTML5 clients, so it should be used in the same way by the extractor too instead of being a string. As the timestamp doesn't seem to exceed 5 digits, an integer is used to store its value.	2023-09-21 21:59:32 +02:00
AudricV	7de3753a81	[YouTube] Refactor JavaScript player management API This commit is introducing breaking changes. For clients, everything is managed in a new class called YoutubeJavaScriptPlayerManager: - caching JavaScript base player code and its extracted code (functions and variables); - getting player signature timestamp; - getting deobfuscated signatures of streaming URLs; - getting streaming URLs with a throttling parameter deobfuscated, if applicable. The class delegates the extraction parts to external package-private classes: - YoutubeJavaScriptExtractor, to extract and download YouTube's JavaScript base player code: it was already present before and has been edited mainly to remove the previous caching system and make it package-private; - YoutubeSignatureUtils, for player signature timestamp and signature deobfuscation function of streaming URLs, added in a recent commit; - YoutubeThrottlingParameterUtils, which was originally YoutubeThrottlingDecrypter, for throttling parameter of streaming URLs deobfuscation function and checking whether this parameter is in a streaming URL. YoutubeJavaScriptPlayerManager caches and then runs the extracted code if it has been executed successfully. The cache system of throttling parameters deobfuscated values has been kept, its size can be obtained using the getThrottlingParametersCacheSize method and can be cleared independently using the clearThrottlingParametersCache method. If an exception occurs during the extraction or the parsing of a function or property which is not related to JavaScript base player code fetching, it is stored until caches are cleared, making subsequent failing extraction calls of the requested function or property faster and consuming less resources, as the result should be the same until the base player code changes. All caches can be reset using the clearAllCaches method of YoutubeJavaScriptPlayerManager. Classes using JavaScript base player code and utilities directly (in the code and its tests) have been also updated in this commit.	2023-09-21 21:59:32 +02:00
AudricV	6884d191cd	[YouTube] Add utility class around signatures and fix signature deobfuscation function extraction The goal of this class is to decouple the extraction of signature timestamp and signature deobfuscation function from YoutubeStreamExtractor. The extraction of the signature deobfuscation function has been also adapted to support the latest YouTube player versions. This new class, YoutubeSignatureUtils, doesn't store anything temporary such as a copy of the player code, which has to be passed where required. It is not public, as it will be used by a JavaScript player manager class in the future, in order to handle in a better way fetching, caching and resetting cache of the player code.	2023-09-21 21:59:26 +02:00
AudricV	e16d521b7b	[MediaCCC] Apply changes in Extractors Also remove usage of the conference logo as the banner of a conference, as it is a logo and not a banner.	2023-08-12 22:56:30 +02:00
AudricV	306068a63b	[MediaCCC] Apply changes in InfoItemExtractors	2023-08-12 22:56:30 +02:00
AudricV	2f40861428	[MediaCCC] Add utility methods to get image lists from conference logos and streams These three new methods, added in MediaCCCParsingHelper, getImageListFromImageUrl(String), getThumbnailsFromStreamItem(JsonObject) and getThumbnailsFromLiveStreamItem(JsonObject) (the last two are based on a common method, getThumbnailsFromObject(JsonObject, String, String)), return an empty list if no image URL could be extracted. Images returned have their height and width unknown and a resolution level depending on the image key of the JSON API response.	2023-08-12 22:56:30 +02:00
AudricV	71cda03c4c	[Bandcamp] Apply changes in Extractors	2023-08-12 22:56:29 +02:00
AudricV	7e01eaac33	[Bandcamp] Apply changes in InfoItemExtractors	2023-08-12 22:56:29 +02:00
AudricV	4b80d737a4	[Bandcamp] Add utility methods to get multiple images Bandcamp images work with image IDs, which provide different resolutions. Images on Bandcamp are not always squares, and some IDs respect aspect ratios whereas others do not. The extractor will only use the ones which preserve aspect ratio and will not provide original images, for performance and size purposes. Because of this aspect ratio preservation constraint, only one dimension will be known at a time. The image IDs with their respective dimension used are: - 10: 1200w; - 101: 90h; - 170: 422h; - 171: 646h; - 20: 1024w; - 200: 420h; - 201: 280h; - 202: 140h; - 204: 360h; - 205: 240h; - 206: 180h; - 207: 120h; - 43: 100h; - 44: 200h. (Where w represents the width of the image and h the height of the image) Note that these dimensions are theoretical because if the image size is less than the dimensions of the image ID, it will not be upscaled but kept at its original size. All these resolutions are stored in a private static list of ThumbnailSuffixes in BandcampExtractorHelper, in which the methods to get multiple images have been added: - getImagesFromImageUrl(String): public method to get images from an image URL; - getImagesFromImageId(long, boolean): public method to get images from an image ID; - getImagesFromImageBaseUrl(String): private utility method to get images from the static list of ThumbnailSuffixes from a given image base URL, containing the path to the image, an "a" letter if it comes from an album, its ID and an underscore. Some existing methods have been also edited: - the documentation of getImageUrl(long, boolean) has been changed to reflect the Bandcamp images findings; - getThumbnailUrlFromSearchResult has been renamed to getImagesFromSearchResult, and a documentation has been added to this method. The method replaceHttpWithHttps of the Utils class has been also used in BandcampExtractorHelper instead of doing manually what the method does.	2023-08-12 22:56:29 +02:00
AudricV	4e6fb368bc	[PeerTube] Apply changes in Extractors and remove usages of default avatar picture The default avatar picture was used when no profile picture was found, but it was removed and split in multiple images. Thumbnails' size is not known, as this data is not provided by the API.	2023-08-12 22:56:29 +02:00
AudricV	0a6011a50e	[PeerTube] Apply changes in InfoItemExtractors Also lower the visibility of attributes of channels and playlists InfoItems to private.	2023-08-12 22:56:29 +02:00
AudricV	6f8331524b	[PeerTube] Add utility method to get thumbnails of playlists and videos This method, getThumbnailsFromPlaylistOrVideoItem, has been added in PeertubeParsingHelper and returns the two image variants for playlists and videos.	2023-08-12 22:56:28 +02:00
AudricV	81c0d80a54	[PeerTube] Add utility methods to get avatars and banners of accounts and channels Four new static methods have been added in PeertubeParsingHelper to do so: - two public methods to get the corresponding image type: getAvatarsFromOwnerAccountOrVideoChannelObject(String, JsonObject) and getBannersFromAccountOrVideoChannelObject(String, JsonObject); - two private methods as helper methods: getImagesFromAvatarsOrBanners(String, JsonObject, String, String) and getImagesFromAvatarOrBannerArray(String, JsonArray).	2023-08-12 22:56:28 +02:00
AudricV	31da5beb51	[SoundCloud] Apply changes in Extractors	2023-08-12 22:56:28 +02:00
AudricV	a3a74cd566	[SoundCloud] Apply changes in InfoItemExtractors and return track user avatars as uploader avatars in SoundcloudStreamInfoItemExtractor	2023-08-12 22:56:28 +02:00
AudricV	7f818217d2	[SoundCloud] Add utility methods to get images from track JSON objects and image URLs These new public and static methods, added in SoundcloudParsingHelper, getAllImagesFromArtworkOrAvatarUrl(String) and getAllImagesFromVisualUrl(String) (which call a common private method, getAllImagesFromImageUrlReturned(String, List<ImageSuffix>, List<Image>)), return an unmodifiable list of JPEG images containing almost every image resolution provided by SoundCloud except the original size and the tiny resolution (for artworks and avatars, as the image size is 20x20 for artworks and 18x18 for avatars, so very close to or equal to the t20x20 resolution): - for artworks and avatars: - mini: 16x16; - t20x20: 20x20; - small: 32x32; - badge: 47x47; - t50x50: 50x50; - t60x60: 60x60; - t67x67: 67x67; - large: 100x100; - t120x120: 120x120; - t200x200: 200x200; - t240x240: 240x240; - t250x250: 250x250; - t300x300: 300x300; - t500x500: 500x500. - for visuals/user banners: - t1240x260: 1240x260; - t2480x520: 2480x520. Duplicated code in two methods of SoundcloudParsingHelper (getUsersFromApi(ChannelInfoItemsCollector, String) and getStreamsFromApi(StreamInfoItemsCollector, String, boolean)) has been merged into one common private method, getNextPageUrlFromResponseObject(JsonObject).	2023-08-12 22:56:28 +02:00
AudricV	266cd1f76b	[YouTube] Apply changes in YoutubeMusicSearchExtractor and split its InfoItemExtractors into separate classes Splitting YoutubeMusicSearchExtractor's InfoItemExtractors into separate classes (YoutubeMusicSongOrVideoInfoItemExtractor, YoutubeMusicAlbumOrPlaylistInfoItemExtractor and YoutubeMusicArtistInfoItemExtractor) simplifies YoutubeMusicSearchExtractor and makes it easier to read and apply changes to InfoItems (no more losing at least a quarter of a line due to indentation). These InfoItems, in which the image changes have been applied, don't extend the YouTube ones anymore, as most methods were overridden and the few ones that are not don't apply in YouTube Music items responses, so it was useless to extend them. The code of YoutubeMusicSearchExtractor has also been improved a bit.	2023-08-12 22:56:27 +02:00
AudricV	c1981ed54f	[YouTube] Apply changes in Extractors except YoutubeMusicSearchExtractor Also improve a bit some code related to the changes.	2023-08-12 22:56:27 +02:00
AudricV	4cc99f9ce1	[YouTube] Apply changes in InfoItemExtractors except YouTube Music ones	2023-08-12 22:56:27 +02:00


1324 Commits